Estimating Species Trees from Quartet Gene Tree Distributions under the Coalescent Model

Author

  • Martin Kreidl
Abstract

In this article we propose a new method, which we name ‘quartet neighbor joining’ or ‘quartet-NJ’, to infer an unrooted species tree on a given set of taxa T from empirical distributions of unrooted quartet gene trees on all four-taxon subsets of T. In particular, quartet-NJ can be used to estimate a species tree on T from distributions of gene trees on T. The quartet-NJ algorithm is conceptually very similar to classical neighbor joining, and its statistical consistency under the multispecies coalescent model is proven by a variant of the classical ‘cherry picking’ theorem. To demonstrate the suitability of quartet-NJ, coalescent processes were simulated on two different species trees (with five and nine taxa, respectively), and quartet-NJ was applied to the simulated gene tree distributions. Further, quartet-NJ was applied to quartet distributions obtained from multiple sequence alignments of 28 proteins of nine prokaryotes.
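The abstract notes that gene trees on T can be reduced to quartet distributions on the four-taxon subsets of T. The following is a minimal sketch of that tabulation step only; it is not the quartet-NJ algorithm itself. It assumes each unrooted gene tree is given as a collection of its nontrivial splits (one side of each bipartition, as a set of taxon labels), and the function names `quartet_topology` and `quartet_distributions` are hypothetical. Trees that do not resolve a given quartet are skipped, which is also an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations


def quartet_topology(splits, quartet):
    """Return the pairing ((a, b), (c, d)) displayed by the tree, or None."""
    a, b, c, d = quartet
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    for side in splits:
        for left, right in pairings:
            left_in = [x in side for x in left]
            right_in = [x in side for x in right]
            # The split separates the two pairs if one pair lies entirely
            # inside `side` and the other lies entirely outside it.
            if (all(left_in) and not any(right_in)) or (
                all(right_in) and not any(left_in)
            ):
                return (left, right)
    return None  # the tree is unresolved on these four taxa


def quartet_distributions(gene_trees, taxa):
    """Empirical quartet topology frequencies for every four-taxon subset."""
    result = {}
    for quartet in combinations(sorted(taxa), 4):
        counts = Counter()
        for splits in gene_trees:
            topology = quartet_topology(splits, quartet)
            if topology is not None:  # assumption: skip unresolved trees
                counts[topology] += 1
        total = sum(counts.values())
        if total:
            result[quartet] = {t: n / total for t, n in counts.items()}
    return result


if __name__ == "__main__":
    # Two hypothetical gene trees on taxa A..E, given by their splits.
    gene_trees = [
        [frozenset("AB"), frozenset("DE")],  # ((A,B),C,(D,E))
        [frozenset("AC"), frozenset("DE")],  # ((A,C),B,(D,E))
    ]
    for quartet, dist in quartet_distributions(gene_trees, "ABCDE").items():
        print(quartet, dist)
```

A split-based representation is used here because a quartet ab|cd is displayed by an unrooted tree exactly when some split places a and b on one side and c and d on the other, so no tree traversal is needed.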

Similar resources

Topological metrizations of trees, and new quartet methods of tree inference

Topological phylogenetic trees can be assigned edge weights in several natural ways, highlighting different aspects of the tree. Here the rooted triple and quartet metrizations are introduced, and applied to formulate novel fast methods of inferring large trees from rooted triple and quartet data. These methods can be applied in new statistically consistent procedures for inference of a species...

Gene tree distributions under the coalescent process.

Under the coalescent model for population divergence, lineage sorting can cause considerable variability in gene trees generated from any given species tree. In this paper, we derive a method for computing the distribution of gene tree topologies given a bifurcating species tree for trees with an arbitrary number of taxa in the case that there is one gene sampled per species. Applications for g...

Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies

Species tree reconstruction is complicated by effects of incomplete lineage sorting, commonly modeled by the multi-species coalescent model (MSC). While there has been substantial progress in developing methods that estimate a species tree given a collection of gene trees, less attention has been paid to fast and accurate methods of quantifying support. In this article, we propose a fast algori...

Coalescent methods for estimating phylogenetic trees.

We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to...

Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent.

Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals, each with many genes, splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. Thi...

Publication date: 2011